Mavlono Zidan


Channel's geo and language: Uzbekistan, Uzbek
Category: Books


Software Engineering student. Currently unemployed, but a pizza maker.



Is it that our lyceums have degraded, or is it that they just aren't coming?


Why is education illegal in America?


News from our neighborhood




(Part 2)
def preprocess(data):
    inputs = tokenizer(data["context"], data["question"], truncation=True, padding="max_length", max_length=384)

    start_positions = []
    end_positions = []

    for i, answer in enumerate(data["answers"]):
        # Character offsets of the answer inside the context
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        # The model expects token indices, not character offsets,
        # so map them with char_to_token (the context is sequence 0)
        start_token = inputs.char_to_token(i, start_char, sequence_index=0)
        end_token = inputs.char_to_token(i, end_char - 1, sequence_index=0)
        # Fall back to the [CLS] token if the answer was truncated away
        start_positions.append(start_token if start_token is not None else 0)
        end_positions.append(end_token if end_token is not None else 0)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions

    return inputs
What is being done here?
1) The text (context) and the questions (question) are tokenized.
2) start_positions and end_positions mark the token positions where the answer begins and ends (the character offsets stored in the dataset are converted to token indices).
4. Putting the data into the right format:
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "start_positions", "end_positions"])
Here the dataset is converted to PyTorch format, because the Hugging Face Trainer works with PyTorch.
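A quick way to see the effect (a small sketch; it assumes the map and set_format calls above have already run):
sample = dataset[0]
print(type(sample["input_ids"]))  # <class 'torch.Tensor'> instead of a plain list
print(sample["input_ids"].shape)  # torch.Size([384]) - padded to max_length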
5. Training parameters:
training_args = TrainingArguments(
    output_dir="./quick_model",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    save_steps=500,
    evaluation_strategy="no",
    fp16=torch.cuda.is_available(),
)
To keep things running smoothly, a single epoch (one pass over the dataset) is used.
If a GPU is available, FP16 acceleration is turned on.
6. Training the model:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
Here the model is trained on the SQuAD dataset.
The Trainer class automatically loads the data, tracks the model, and runs the optimization.
Expected result:
✅ The model learns to find answers in text
✅ Training on 1% of the data is quick (roughly 5-10 minutes)
✅ The trained model can be used later.
Conclusion:
This code builds a question answering (QA) system and trains a DistilBERT model on the SQuAD dataset. The training is tuned for an HP Victus laptop (a small dataset, 1 epoch). Once the model is trained, you can give it new questions and use it to find the answers!
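As an illustration of that last point, here is a minimal inference sketch. It assumes the fine-tuned weights were saved to ./quick_model (e.g. via trainer.save_model("./quick_model"), which the script below does not do by itself); the question and context are made-up examples.
from transformers import pipeline
# Load the fine-tuned checkpoint saved by the training run
qa = pipeline(
    "question-answering",
    model="./quick_model",
    tokenizer="distilbert-base-uncased",
)
result = qa(
    question="Where is the Eiffel Tower located?",
    context="The Eiffel Tower is a wrought-iron lattice tower in Paris, France.",
)
print(result["answer"])  # expected: something like "Paris, France"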


from datasets import load_dataset
from transformers import DistilBertTokenizerFast, DistilBertForQuestionAnswering, Trainer, TrainingArguments
import torch

dataset = load_dataset("squad", split="train[:1%]")  # Use only 1% for quick training

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)
model = DistilBertForQuestionAnswering.from_pretrained(model_name)

def preprocess(data):
    inputs = tokenizer(data["context"], data["question"], truncation=True, padding="max_length", max_length=384)

    start_positions = []
    end_positions = []

    for i, answer in enumerate(data["answers"]):
        # Character offsets of the answer inside the context
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        # The model expects token indices, not character offsets,
        # so map them with char_to_token (the context is sequence 0)
        start_token = inputs.char_to_token(i, start_char, sequence_index=0)
        end_token = inputs.char_to_token(i, end_char - 1, sequence_index=0)
        # Fall back to the [CLS] token if the answer was truncated away
        start_positions.append(start_token if start_token is not None else 0)
        end_positions.append(end_token if end_token is not None else 0)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions

    return inputs

dataset = dataset.map(preprocess, batched=True)
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "start_positions", "end_positions"])

training_args = TrainingArguments(
    output_dir="./quick_model",
    per_device_train_batch_size=8,
    num_train_epochs=1,  # on my HP Victus CPU I can only train 1 epoch on 1% of SQuAD
    save_steps=500,
    evaluation_strategy="no",
    fp16=torch.cuda.is_available(),
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
Training a question answering (QA) system using the DistilBERT model.
This code uses the Hugging Face Transformers library to build a DistilBERT model and trains it on SQuAD (Stanford Question Answering Dataset) in a short time.
In this process:
✅ The data is loaded
✅ It is tokenized (words are converted to numeric codes)
✅ The model is prepared
✅ Training starts

📂 1. The dataset – SQuAD
What is SQuAD (Stanford Question Answering Dataset)?
It is a dataset built for question answering, where each record contains:
A passage of text (context)
A question about it (question)
A span of the text that serves as the answer (answer). We train the model so that it learns to find the answer in the passage.

dataset = load_dataset("squad", split="train[:1%]")
Only 1% of the SQuAD dataset is loaded, because the processor in my HP Victus laptop can only handle a small dataset. I don't have the resources for the whole dataset.
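To get a feel for one record, you can print the first example (a small sketch; the field layout is the SQuAD schema, the values are placeholders):
print(dataset[0])
# Each record follows the SQuAD schema:
# {
#   "id": "...",
#   "title": "...",
#   "context": "... a passage of text ...",
#   "question": "... a question about the passage ...",
#   "answers": {"text": ["... answer span ..."], "answer_start": [<char offset>]},
# }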

2. Model and tokenization

model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)
model = DistilBertForQuestionAnswering.from_pretrained(model_name)

Here the DistilBERT model is loaded. It is the lightest (distilled) version of the BERT model.
Tokenization converts words into numbers, turning the text into something the model can understand.
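A tiny sketch of what that conversion produces (the sentence pair is made up):
enc = tokenizer("Who wrote the book?", "The book was written by Alisher Navoiy.")
print(enc["input_ids"])       # numeric token ids; for this model [CLS] = 101, [SEP] = 102
print(enc["attention_mask"])  # 1 marks real tokens, 0 would mark padding
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))  # map the ids back to tokens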


The funniest part: our university has an AI&Robotics major, yet the university still doesn't have its own GPU. How are students supposed to train AI models on their own like this? Or will we keep working with datasets that are a drop in the ocean?


dataset = load_dataset("squad", split="train[:1%]")
With 1% of the dataset it now takes 37 minutes.
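For a rough sense of the workload, the number of optimizer steps can be estimated from the split size (a back-of-the-envelope sketch; 87,599 is the size of the full SQuAD train split):
import math
full_train = 87_599         # examples in the full SQuAD train split
subset = full_train // 100  # "train[:1%]" keeps about 875 examples
batch_size = 8              # per_device_train_batch_size from the script above
epochs = 1
steps = math.ceil(subset / batch_size) * epochs
print(steps)  # ~110 steps, each a full forward+backward pass on the CPU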


400 hours


Yes, our club has grown strong too. Red Bull doesn't just put money into sports and entertainment; look, it supports intellectual games as well.


Couples everywhere you look.


May everyone get the chance to go there, me too 😂


I wanted to write about our library at NUU, but when I think about it, talking about what the guys and girls do there would, ethically speaking, make a "single" guy look like he's venting anger and hatred at the couples there. In my opinion, schools should also add a subject on where one is supposed to go on a "date".


Pretty good for a first time, but not enough; still too far from the destination.




Unix is power.


I'm fed up with the whole Instagram genre of "I don't do relationships", "I'm a red flag", "I'm Kazakh", "I'm Uzbek".


If only you had 10k in your account right now, you'd buy that coin and hold it for 3-4 years.




Speaking as someone who has used DeepSeek: this is another major turning point; in my view we've just gotten twice as close to AGI.

Last 20 posts shown.